ABSTRACT

Triple Modular Redundancy (TMR) is a classical technique for improving
the reliability of digital systems. However, applying TMR to microcomputer
systems may not improve overall system reliability because voter circuits
may contribute as much to system unreliability as the microprocessors
themselves. We examine the issues that affect the effectiveness of TMR for
microcomputer systems, including voter unreliability, considerations for
transient recovery, and reliability of semiconductor memory systems. With
careful application, TMR can improve the mission time of a small system by
a factor of three or more.

1. INTRODUCTION


Triple modular redundancy (TMR) has been used in a number of
systems to increase reliability for highly critical applications
[1,2]. TMR is applied to a nonredundant system by partitioning the
system into a number of modules, triplicating the modules, and placing
majority voters at the interfaces between modules. In a TMR system,
errors produced by any single faulty module are masked by a simple
majority vote. As shown in Fig. 1, the effects of single voter
failures can be overcome by triplicating the voters. There are no
critical single-point failures in the system of Fig. 1; that is, the
system will continue correct operation in spite of any single module
or voter failure.
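
The masking behavior described above can be illustrated with a minimal sketch. It assumes that each module output is a word held in a Python integer and that the voter is a bitwise two-out-of-three majority function; the name `majority` and the sample values are illustrative and do not come from the original text.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise two-out-of-three majority vote over three module outputs."""
    return (a & b) | (a & c) | (b & c)


# Hypothetical 8-bit outputs from three replicated modules.
good = 0b10110010
faulty = good ^ 0b01001000  # one module has two of its bits flipped

# The single faulty module is outvoted, whichever position it occupies.
assert majority(good, good, faulty) == good
assert majority(faulty, good, good) == good
```

Because each bit is voted on independently, errors in different bit positions of different modules are also masked, a property that matters for the memory analysis later in the paper.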

For reasons of cost, TMR has in the past been applied only to systems
for highly critical applications. However, the decreasing cost of
computer processor and memory hardware is increasing the feasibility of
TMR as a means of improving reliability in general-purpose systems. Of
course, for some systems it can be argued that improving processor and
memory reliability is of minor importance because most failures are
attributable to peripherals and input/output subsystems. However, in
addition to the fact that peripheral and input/output reliability have
been studied elsewhere [3], there is a strong argument to support the
development of reliable processors and memories. In many situations the
most practical way to increase system reliability is by providing standby
spares that can be activated automatically upon failure of primary
units [1,4,5]. Obviously, sparing schemes can increase reliability
only if the error detection and automatic switching mechanisms are
themselves very reliable [6]. Hence the development of inexpensive
ultra-reliable processors can provide a new way of implementing
"test and repair" functions for any system with spares (digital or
otherwise).

The thought of applying TMR to microcomputer systems raises
some interesting questions. First of all, since a microprocessor is
just a single chip, it is not clear that reliability can really be
increased in a system that must use many voter chips constructed from
the same unreliable technology as the microprocessor itself. Secondly,
a microprocessor is a rather complex sequential machine with only
limited access to its internal state. When a transient failure causes
one of the replicated microprocessors to get out of synchronization
with the others, it is not clear that the system will ever be resynchronized
so that additional transients can be tolerated. Thirdly, the reliability
of semiconductor memory systems associated with microprocessors must be
considered.

2. MICROCOMPUTER SYSTEM MODEL

We will use the simple model of a microcomputer system shown
in Fig. 2. The system consists simply of a microprocessor and a
memory containing programs and data. Data, address, and control
outputs of the microprocessor are connected to the memory; data and
control outputs of the memory are connected to the microprocessor.
Connections to peripherals are ignored; for the TMR system it is
assumed that each peripheral interface has voters which monitor the I/O
commands given by all three processors.

A typical LSI microprocessor is the Intel 8080 [7]. The 8080
is an 8-bit processor in a 40-pin package. It has 16 address lines,
an 8-bit bidirectional data bus, and 9 control lines entering and
leaving the chip. The data bus must be externally split into two
one-way buses for voting to be applied, and hence there are a total
of 41 lines in an 8080 system that could be voted on. Since three
voter circuits (Fig. 3) can be placed on a single 14-pin package, it
is conceivable that a TMR 8080 system could have 3 8080 packages and
41 voter packages (triplicated voters) or 14 voter packages
(non-triplicated voters). Since a large percentage of integrated circuit
failures are related to problems in packaging and I/O pins rather than
circuit complexity, it is quite conceivable that the total voter
unreliability in a TMR microcomputer system could approach or even exceed
the microprocessor unreliability. In such a system the use of TMR could
actually decrease the overall system reliability. After introducing some
reliability concepts, we will give a simple analysis that shows this.
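
The package counts quoted above follow from simple arithmetic, reproduced here as a short sketch. The line counts and the three-voters-per-package figure are taken from the text; the constant and function names are illustrative.

```python
import math

# Line counts for the Intel 8080 as given in the text.
ADDRESS_LINES = 16
DATA_LINES = 8        # bidirectional bus, split into two one-way buses for voting
CONTROL_LINES = 9
VOTERS_PER_PACKAGE = 3

voted_lines = ADDRESS_LINES + 2 * DATA_LINES + CONTROL_LINES


def voter_packages(lines: int, triplicated: bool) -> int:
    """Number of voter packages needed to vote on the given number of lines."""
    voters = lines * (3 if triplicated else 1)
    return math.ceil(voters / VOTERS_PER_PACKAGE)


print(voted_lines)                         # 41
print(voter_packages(voted_lines, True))   # 41
print(voter_packages(voted_lines, False))  # 14
```

Either way the voter packages outnumber the three microprocessor packages several times over, which is why packaging-related failures in the voters cannot be neglected.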

3. RELIABILITY CONCEPTS


The reliability of a component or system is a function of time
R(t), the probability that the component or system has not failed
at time t. For individual components of an electronic system it is
commonly assumed that failures after burn-in have a Poisson
distribution, so that the reliability of a component is given by the
formula $R(t) = e^{-\lambda t}$. The parameter $\lambda$ depends on the
component and is called the failure rate. Typical SSI circuits have
failure rates of $10^{-6}$ to $10^{-8}$ failures/hour, while failure rates
of LSI circuits such as 1024-bit memories have been reported in the range
$10^{-?}$ to $10^{-?}$ failures/hour (exponents illegible in the source)
[8,9].

Individual component failures in a system are assumed to be
independent, so that the system reliability is the product of the
component reliabilities. For example, in a system composed of n
identical components with failure rate $\lambda_c$, the system reliability is

$$R_{sys}(t) = (e^{-\lambda_c t})^n = e^{-\lambda_{sys} t}, \quad \text{where } \lambda_{sys} = n\lambda_c.$$

Explicit  identification  of  the  time  dependence  of  reliability  is 
often  omitted  in  reliability  expressions.  Hence  reliability  is  indicated 
simply  by  R,  and  it  is  understood  that  the  reliability  at  some  time  t 
can  be  obtained  by  substituting  the  value  of  R(t)  for  every  occurrence 
of  R in  an  expression. 

For complex systems it is useful to have a single number that
characterizes the system reliability rather than a continuous function
of time. Sometimes the mean time to failure (MTF) is used to provide
this characterization. The MTF is defined as the integral from time
equals 0 to infinity of the system reliability. For components the
MTF is therefore simply the inverse of the failure rate; for any
system the MTF can be derived from the reliability expression. A
parameter that has been found to be more useful than MTF for evaluating
ultra-reliable systems is the mission time. The mission time for a
system with reliability $R_{sys}(t)$ is defined to be the value of t such
that $R_{sys}(t) = R_f$, where $R_f$ is some predetermined final
reliability. The value used for $R_f$ depends on the application, but a
typical value is .95. The mission time indicates the amount of time it
takes for the reliability of an initially perfect system to degrade to
$R_f$ [10].
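
For the exponential model, both the MTF and the mission time have closed forms. The short derivation below is a worked illustration; it assumes a single component with constant failure rate $\lambda$ and writes the final reliability as $R_f$.

```latex
\begin{gather*}
\mathrm{MTF} = \int_0^\infty e^{-\lambda t}\,dt = \frac{1}{\lambda}, \\
e^{-\lambda t_{mission}} = R_f
  \quad\Longrightarrow\quad
  t_{mission} = \frac{-\ln R_f}{\lambda} \approx \frac{0.0513}{\lambda}
  \quad (R_f = 0.95).
\end{gather*}
```

Under this model the mission time at a final reliability of .95 is only about 5% of the MTF, which is why the two measures can rank ultra-reliable systems quite differently.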

In comparing ultra-reliable systems with each other and with
nonredundant systems, the mission time improvement factor (MTIF) is often
used. The MTIF is the ratio of the mission times of a redundant system
and the corresponding nonredundant system [10].

The reliability of a TMR system can be calculated by partitioning
the system into a number of cells such that errors on the outputs of a
cell are corrected by voters at the inputs of subsequent cells [11], as
indicated for the simple system of Fig. 1. Then the individual cell
reliabilities are calculated, where a cell is considered to be operating
correctly if at least two out of three of each of its triplicated output
lines are correct. The system operates correctly if and only if each cell
operates correctly, and so the system reliability is the product of the
cell reliabilities.

The simplest type of cell has one triplicated module type and
voters at the inputs of the modules; two of these cells comprise the
system in Fig. 1. If $R_v$ is the voter reliability and $R_m$ is the module
reliability, then the cell reliability is

$$R_{cell} = (R_m R_v)^3 + 3(R_m R_v)^2(1 - R_m R_v),$$

since two out of three of the cell outputs are correct if and only if two
out of three of the voter-module pairs are working correctly. If there
are n module inputs then n voters are used for each module and $R_v^n$
replaces $R_v$ in the expression above.
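
The cell reliability expression can be evaluated directly. The following is a minimal sketch of that formula, with the voter reliability per input written `r_v`, the module reliability `r_m`, and the number of module inputs `n_inputs`; the function name and the sample values are illustrative.

```python
def cell_reliability(r_m: float, r_v: float, n_inputs: int = 1) -> float:
    """Reliability of a TMR cell: at least two of three voter-module pairs work."""
    pair = r_m * r_v ** n_inputs  # one module together with its n input voters
    return pair ** 3 + 3 * pair ** 2 * (1 - pair)


# Hypothetical values: the same module with perfect and with imperfect voters.
print(cell_reliability(0.9, 1.0))
print(cell_reliability(0.9, 0.99, n_inputs=8))
```

With perfect voters a module reliability of 0.9 gives a cell reliability of 0.972; each additional imperfect voter on a module input lowers the pair reliability and hence the cell reliability.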


4. TMR MICROCOMPUTER SYSTEM UNRELIABILITY


As we indicated in section 2, a typical microprocessor might
have 40 or more lines to be voted upon when TMR is applied. In a
small system consisting of a microprocessor and a small number of
memory circuits, the voter unreliability could be greater than the
microprocessor and memory unreliability. Suppose the reliability of
the microprocessor/memory module is $R_m$ and the reliability of a single
voter is $R_v$. If n voters are required then the total voter reliability
is $R_v^n$, and this can be related to the module reliability by a factor
k such that $R_v^n = R_m^k$. The factor k could be in the range .1 (very
reliable voters) to 2 or more (voter reliability per pin comparable to
microprocessor reliability). For example, suppose a microcomputer system
uses one microprocessor and four memory chips, each with failure rate
$\lambda_m = 10^{-?}$. If the voter failure rate is $\lambda_v = 10^{-?}$
(both exponents are illegible in the source), then 40 voters
produce a value of k of .8.
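
The relation between k and the failure rates can be made explicit. The following is a sketch under the exponential model of section 3; it assumes that the module consists of the five chips of the example, each with failure rate $\lambda_m$, and that each of the n voters has failure rate $\lambda_v$.

```latex
\[
R_v^n = e^{-n\lambda_v t}, \qquad R_m = e^{-5\lambda_m t}
\quad\Longrightarrow\quad
R_v^n = R_m^k \ \text{ with } \ k = \frac{n\lambda_v}{5\lambda_m}.
\]
\[
n = 40: \quad k = \frac{8\lambda_v}{\lambda_m}, \qquad
k = 0.8 \iff \lambda_v = \frac{\lambda_m}{10}.
\]
```

Under these assumptions the quoted value of k = .8 corresponds to a voter failure rate one tenth of the chip failure rate.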

A simple reliability analysis of the TMR microcomputer indicates
that the system functions properly if at least two out of three of the
replicated voter/module subsystems function properly. Hence the TMR
system reliability is

$$R_{sys} = (R_m R_v^n)^3 + 3(R_m R_v^n)^2(1 - R_m R_v^n) = (R_m^{1+k})^3 + 3(R_m^{1+k})^2(1 - R_m^{1+k}).$$

The reliability of the nonredundant system is simply $R_m$. The mission
time improvement factor (MTIF) for the TMR system can be calculated as
a function of k, as shown in Fig. 3a for a final reliability $R_f = .95$.
For the perfect voter case (k=0), the theoretical maximum MTIF of
2.84 is obtained, but for imperfect voters (k>0) the MTIF can be
much less. For example, if the module and total voter unreliabilities
are equal (k=1), the MTIF is only 1.42, and for k=2 the redundant system
actually has a lower mission time than the nonredundant system.
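
The dependence of the MTIF on k can be checked numerically. The sketch below assumes the exponential model of section 3 and solves the two-out-of-three reliability equation by bisection; the function names are illustrative, and the failure rate cancels out of the ratio.

```python
import math


def tmr_reliability(p: float) -> float:
    """Probability that at least two of three identical subsystems work."""
    return p ** 3 + 3 * p ** 2 * (1 - p)


def mtif(k: float, r_final: float = 0.95) -> float:
    """Mission time improvement factor of the TMR system over one module."""
    # Bisection for the subsystem reliability p with tmr_reliability(p) == r_final.
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if tmr_reliability(mid) < r_final:
            lo = mid
        else:
            hi = mid
    p = (lo + hi) / 2
    # p = R_m**(1 + k) and R_m = exp(-lam * t), so t_tmr = -ln(p) / ((1 + k) * lam);
    # the nonredundant mission time is -ln(r_final) / lam, and lam cancels.
    return math.log(p) / ((1 + k) * math.log(r_final))


for k in (0.0, 0.5, 1.0, 2.0):
    print(f"k = {k:3.1f}  MTIF = {mtif(k):.2f}")
```

For a final reliability of .95 this gives about 2.84 at k=0 and 1.42 at k=1, and a value below 1 at k=2; in general the MTIF falls off as 1/(1+k).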

The preceding analysis shows that voter reliability can be a
critical factor in TMR microcomputer systems. One way to reduce the
effects of voter unreliability is to reduce the number of voters. For
example, a triplicated microprocessor/memory system could be designed
with no voters at all. The three copies would be initialized to the
same starting state and would run in synchronization from a common
(fault-tolerant) clock. Since the peripherals are assumed to have their own
voters, each peripheral would monitor the I/O commands of all three copies
and would perform the operations dictated by the majority. However,
consider the behavior of this system in the presence of transient failures.
A transient failure can cause a microprocessor to get out of synchronization
with the others, and a second transient can cause system failure unless
the microprocessor is resynchronized. The problem with the no-voter scheme
is that there is no coupling among the replicated microprocessor/memory
systems, and hence there is no mechanism for resynchronization after
transients. In the next section we present a system organization that has
the minimum number of voters required for resynchronization after transients.


5. SYSTEM STRUCTURE FOR RESYNCHRONIZATION


A transient  failure  can  have  an  arbitrary  effect  on  the  state  of 
a microprocessor,  and  after  the  transient  disappears  the  affected 
processor  may  continue  to  have  the  incorrect  state.  If  a second 
transient  failure  affects  a different  processor  before  the  correct 
state  of  the  first  is  restored,  then  two  processors  will  produce 
incorrect  outputs  and  the  TMR  system  will  fail.  This  certainly  runs 
contrary  to  the  desire  to  make  the  system  tolerate  short  transients 
by  the  use  of  TMR.  For  multiple  transients  to  be  tolerated,  the  system 
must  be  structured  so  that  each  replicated  processor  frequently  receives 
a synchronizing sequence during normal operation [12].

Suppose that voters are placed at the master reset input and the
data inputs of each microprocessor, as shown in Fig. 4. The address,
data out, and control lines of each microprocessor go directly to the
corresponding memory module without any voting. This configuration
has the minimum number of voters needed to provide resynchronization
after transient failures. For example, suppose a transient failure
causes several registers of one microprocessor and several words in the
corresponding memory module to contain incorrect data. Each of the
incorrect registers is resynchronized with correct data when it is loaded
from memory, since the voters ensure correct memory output regardless of
any possible errors in the state of one of the memories. Once the
microprocessor is resynchronized, the memory is resynchronized by loading
the incorrect memory words from the microprocessor.

Of course, it is possible that a transient failure can affect not
only the register state but also the program state of a microprocessor.
In general the microprocessor can attain any erroneous state, and before
being resynchronized it can create arbitrary errors in the corresponding
memory module. It is this possibility that necessitates a voter on the
master reset line of the microprocessors. Associated with each
microprocessor is some interface circuitry that can be instructed by the
software to initiate a hardware reset. Periodically the software would cause
such a reset to occur, and since the reset line is voted on, a completely
unsynchronized microprocessor must still obey the reset command. The
reset command causes the microprocessor to begin executing a routine at
some fixed location. The routine in this case must be a synchronizing
routine that first initializes all of the processor registers from memory,
and then corrects any possible errors in a single memory module by
sequentially reading and then rewriting every word in the memory.
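
The memory-correcting half of the synchronizing routine can be modeled in a few lines. This is a simplified sketch, not the routine itself: it assumes each memory copy is a Python list of words, models the voters on the memory outputs as a bitwise majority function, and lets one loop stand in for the three processors that each read the voted word and rewrite it into their own memory. All names and values are illustrative.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise two-out-of-three vote, standing in for the memory output voters."""
    return (a & b) | (a & c) | (b & c)


def resynchronize(memories: list[list[int]]) -> None:
    """Read every word through the voters, then rewrite it in all three copies."""
    size = len(memories[0])
    for address in range(size):
        voted = majority(*(memory[address] for memory in memories))
        for memory in memories:
            memory[address] = voted


# Three copies of a small hypothetical memory image.
image = [0x3E, 0x00, 0xC3, 0x10]
memories = [list(image) for _ in range(3)]
memories[1][0] = 0xFF  # arbitrary errors left in one module by a transient
memories[1][3] = 0x00

resynchronize(memories)
assert memories[0] == memories[1] == memories[2] == image
```

The sketch only restores agreement when at most one copy is wrong at any bit position, which is the single-faulty-module assumption on which TMR rests.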

There are certain hardware/software tradeoffs involving synchronization.
For example, if voters are placed on the address, data and control lines
between the microprocessor output and memory input, then a single erroneous
processor cannot cause bad data to be written into the corresponding
memory module. Thus the software resynchronization process need not
assume the worst case, that arbitrary errors have been created in the
memory. On the other hand, the voting hardware is more expensive and
less reliable.

An alternative to the system structure of Fig. 4 places voters on
the data input lines to the memory rather than on the data output. This
structure still allows synchronization, since an error in a processor
register will be masked when it is written into memory, and memory
can still be re-initialized by reading and then rewriting every memory
word. This structure might even seem better than Fig. 4 because it
prevents a single faulty processor from writing incorrect data into
memory. However, the structure of Fig. 4, which places voters on the
memory outputs rather than inputs, yields significantly higher reliability
when a semiconductor memory system is used.

6. SEMICONDUCTOR MEMORY RELIABILITY

The semiconductor memory module of a microcomputer system can be
modeled as shown in Fig. 5. There is some shared address decoding
and driving circuitry, an array of memory chips, and perhaps some
shared output circuitry. The memory array consists of ns 1-bit by w-word
memory chips arranged in an n x s matrix to form the n-bit by ws-word
array. If the memory chip reliability is $R_c$ and the reliability of the
common circuitry is $R_d$, then module reliability is $R_c^{ns} R_d$ and it
would appear that the reliability of a TMR memory system is

$$R_{sys} = (R_c^{ns} R_d)^3 + 3(R_c^{ns} R_d)^2(1 - R_c^{ns} R_d). \qquad (1)$$

The above analysis neglects the organization of the memory array.
In a system such as Fig. 4 where there is a voter for each bit of the
memory output, the system fails only if there is a simultaneous error
in a single bit position of two of the triplicated memory modules.
Consideration of the memory array structure hence leads to the more accurate
reliability formula,

$$R_{sys} = R_d^3(3R_c^2 - 2R_c^3)^{ns} + 3R_d^2(1 - R_d)R_c^{2ns}. \qquad (2)$$

This expression reflects the fact that at each position in the array of
Fig. 5, two out of three of the replicated memory chips must be working,
independent of other positions in the array.

The reliability expression above always produces a reliability value
greater than or equal to (1). The improvement obtained by using (2)
decreases as the reliability of the memory array ($R_c$) relative to the
common circuitry ($R_d$) increases. For example, if $R_c = 1$ the formulas are
identical. But for typical semiconductor memory systems, the common
circuitry comprises only about 10-15% of the total, and so the
reliability value obtained by considering the structure of the memory
array (2) is significantly higher than that obtained by simple analysis
(1). A typical example is shown in Fig. 6.
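
The difference between the two formulas can be seen numerically. The sketch below evaluates both for an n x s array of chips, writing the chip reliability as `r_c` and the common circuitry reliability as `r_d`; the function names and the sample reliabilities are assumptions chosen only for illustration and are not the values behind Fig. 6.

```python
def tmr_memory_simple(r_c: float, r_d: float, n: int, s: int) -> float:
    """Eqn. (1): each memory module treated as a single unit."""
    module = r_c ** (n * s) * r_d
    return module ** 3 + 3 * module ** 2 * (1 - module)


def tmr_memory_per_position(r_c: float, r_d: float, n: int, s: int) -> float:
    """Eqn. (2): two of three chips must work at each array position."""
    chips = n * s
    two_of_three = 3 * r_c ** 2 - 2 * r_c ** 3
    return (r_d ** 3 * two_of_three ** chips
            + 3 * r_d ** 2 * (1 - r_d) * r_c ** (2 * chips))


# Hypothetical reliabilities at some fixed time t, for an 8 x 4 chip array.
r_c, r_d, n, s = 0.99, 0.999, 8, 4
print(tmr_memory_simple(r_c, r_d, n, s))
print(tmr_memory_per_position(r_c, r_d, n, s))
```

With perfect chips (`r_c = 1`) the two functions agree, as stated in the text; as the array grows, the gap widens because (1) counts every multiple chip failure in different modules as a system failure even when the failed chips are at different array positions.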

The TMR memory reliability indicated by (2) is more accurate than
(1), but it is still not complete. A complete memory system analysis
must be somewhat more complex, taking into account voter reliability, the
placement of voters for the memory system inputs, and the possibility of
having different chip types within the memory array. For example, (2)
may be modified to take into account voter reliability, yielding the
expression,

$$R_{sys} = R_d^3\left[R_v^3\left(R_c^3 + 3R_c^2(1 - R_c)\right)^s + 3R_v^2(1 - R_v)R_c^{2s}\right]^n + 3R_d^2(1 - R_d)(R_v R_c^s)^{2n}. \qquad (3)$$

The  reliability  improvement  of  Eqn.  (2)  and  (3)  over  Eqn.  (1)  is  only 
obtained  when  there  are  voters  on  the  memory  outputs.  If  voting  is  applied 
after  data  has  been  routed  through  a processor,  then  (2)  and  (3)  do  not 
apply.  In  such  a system,  a single  bit  error  in  the  memory  output  can 
produce  multiple  bit  errors  in  the  resulting  processor  outputs,  invalidating 
the  assumption  used  in  deriving  (2)  and  (3). 


The reliability of a TMR memory system should be compared with a
memory system that attains single fault-tolerance by using a
single-error-correcting code. Both systems are guaranteed to correct any
single failure in the memory array, but analysis has shown that the TMR
system is more reliable because it corrects a larger number of multiple
failures. For an 8-bit memory system, coding requires 4 redundant
memory bits per word while TMR requires 16. On the other hand, the
coded system requires a separate copy of the common input circuitry
(Fig. 5) for each bit to tolerate single failures in the common circuitry.
In addition, the output decoder for the coded system is much more complex
than a few TMR voters, and it must be triplicated if the memory system
is being interfaced to a TMR processor, or in any case duplicated if
decoder failures are to at least be detected. Hence for small
fault-tolerant memory systems that are to be interfaced to a TMR processor,
TMR appears to be a much better choice than coding [13].
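
The redundancy figures for an 8-bit word can be reproduced with a short calculation. The sketch assumes a Hamming-style single-error-correcting code, for which r check bits must satisfy 2^r >= m + r + 1 for m data bits; the text does not name the code, so this choice and the function names are assumptions.

```python
def sec_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error-correcting bound)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def tmr_redundant_bits(data_bits: int) -> int:
    """TMR stores two extra copies of every data bit."""
    return 2 * data_bits


print(sec_check_bits(8))      # 4
print(tmr_redundant_bits(8))  # 16
```

The count covers only the memory array; the comparison in the text turns on the common input circuitry and the output decoder, which this calculation does not include.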

7. TMR MICROCOMPUTER SYSTEM RELIABILITY


The reliability of the TMR microcomputer system of Fig. 4 can
be analyzed by using Eqn. (3) from the previous section, by including
the microprocessor reliability as part of the common circuitry term
$R_d$. The reliability of a system with no voters or with voters between
the CPU output and memory input can be derived using Eqn. (1). Fig. 7
shows the reliability of these three possible TMR implementations of a
nonredundant microcomputer with 1K bytes of memory using typical failure
rates. All three TMR systems have higher reliability than the
nonredundant system, and improve the mission time by a factor of about 3.
Among the TMR systems, the implementation with voters at the memory output
(TMR CPU-memory-voter) is most reliable, for the reasons discussed in the
previous section. The system with voters between CPU output and memory
input (TMR CPU-voter-memory) is less reliable than a system with no voters
because of voter unreliability. However, the CPU-voter-memory system is
actually more reliable if transients are considered because of its ability
to resynchronize.

The  reliability  curves  for  similar  implementations  of  a system  with 
more  memory  (8K  bytes)  are  shown  in  Fig.  8.  It  can  be  seen  that  in  this 
case  there  is  little  difference  between  the  no-voter  and  CPU-voter-memory 
implementations  because  the  major  contribution  to  system  unreliability  is 
from  the  memory  chips.  However,  a substantial  improvement  over  these 
implementations is obtained in the CPU-memory-voter implementation, because
of  the  greater  number  of  memory  failures  tolerated. 




8.  CONCLUSION 


Triple modular redundancy, if carefully applied, is an effective
way of increasing the reliability of microcomputer systems. Application
of TMR to microcomputer systems must take into account the fact that
voters may be as unreliable as microprocessors themselves, that
microprocessors are complex sequential machines that require resynchronization
after transients, and that special considerations apply to reliability of
semiconductor memory systems used with microprocessors. We have shown
two examples of small microcomputer systems in which TMR improves the
mission time by a factor of three or more.